Combining Bilingual Terminology Mining and Morphological Modeling for Domain Adaptation in SMT
Abstract
Translating in technical domains is a well-known problem in SMT, as the lack of parallel documents causes significant data sparsity problems. We discuss and compare different strategies for enriching SMT systems built on general-domain data with bilingual terminology mined from comparable corpora. In particular, we focus on the target-language inflection of the terminology data and present a pipeline that can generate previously unseen inflected forms.
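The abstract does not describe how the inflection pipeline works internally. Purely as an illustration of the idea that inflected forms can be produced even when only a base form was mined, the sketch below assumes a two-step design: mined terms are stored as target-language lemmas, and a surface form is generated from the lemma plus morphological features (number, case) predicted for the context. The German example terms, the suffix table, and the names `MINED_TERMS`, `TOY_SUFFIXES`, `generate`, and `translate_term` are all hypothetical; a real system would rely on a full morphological generator and a feature-prediction model rather than a hand-written table.

```python
from typing import Dict, Tuple

# Hypothetical bilingual terms "mined from comparable corpora":
# source term -> (target lemma, grammatical gender). Only base forms are stored.
MINED_TERMS: Dict[str, Tuple[str, str]] = {
    "wind turbine": ("Windturbine", "fem"),
    "rotor blade": ("Rotorblatt", "neut"),
}

# Toy suffix table standing in for a real morphological generator.
# Keys: (gender, number, case); values: suffix appended to the lemma.
# Deliberately simplified: it only covers the two example lemmas above.
TOY_SUFFIXES: Dict[Tuple[str, str, str], str] = {
    ("fem", "sg", "nom"): "",
    ("fem", "pl", "nom"): "n",    # Windturbine -> Windturbinen
    ("neut", "sg", "nom"): "",
    ("neut", "sg", "gen"): "s",   # Rotorblatt -> Rotorblatts
}


def generate(lemma: str, gender: str, number: str, case: str) -> str:
    """Generate an inflected surface form from a lemma and morphological features."""
    suffix = TOY_SUFFIXES.get((gender, number, case))
    if suffix is None:
        # Feature combination not covered by the toy table: fall back to the lemma.
        return lemma
    return lemma + suffix


def translate_term(source_term: str, number: str, case: str) -> str:
    """Look up a mined term and inflect it for the given features.

    In a real pipeline the features would be predicted from the translation
    context; here they are simply passed in by the caller.
    """
    lemma, gender = MINED_TERMS[source_term]
    return generate(lemma, gender, number, case)


print(translate_term("wind turbine", "pl", "nom"))  # Windturbinen
print(translate_term("rotor blade", "sg", "gen"))   # Rotorblatts
```

Neither "Windturbinen" nor "Rotorblatts" appears in the hypothetical mined list; both are produced at generation time, which is the kind of previously unseen inflected form the abstract refers to. Keeping terms at the lemma level also means a single mined entry serves every inflected context.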
Similar resources
Latent Structure Discriminative Learning for Natural Language Processing
Natural language is rich with layers of implicit structure, and previous research has shown that we can take advantage of this structure to make more accurate models. Most attempts to utilize forms of implicit natural language structure for natural language processing tasks have assumed a pre-defined structural analysis before training the task-specific model. However, rather than fixing the la...
Enhancing Statistical Machine Translation with Bilingual Terminology in a CAT Environment
In this paper, we address the problem of extracting and integrating bilingual terminology into a Statistical Machine Translation (SMT) system for a Computer Aided Translation (CAT) tool scenario. We develop a framework that, taking as input a small amount of parallel in-domain data, gathers domain-specific bilingual terms and injects them in an SMT system to enhance the translation productivity...
Statistical Machine Translation with Terminology
This paper considers a scenario which is slightly different from Statistical Machine Translation (SMT) in that we are given almost perfect knowledge about bilingual terminology, considering the situation when a Japanese patent is applied to or granted by the Japanese Patent Office (JPO). Technically, we incorporate bilingual terminology into Phrase-based SMT (PB-SMT) focusing on the statistical...
Identification of Bilingual Terms from Monolingual Documents for Statistical Machine Translation
The automatic translation of domain-specific documents is often a hard task for generic Statistical Machine Translation (SMT) systems, which are not able to correctly translate the large number of terms encountered in the text. In this paper, we address the problems of automatic identification of bilingual terminology using Wikipedia as a lexical resource, and its integration into an SMT system...
Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
Currently, almost all of the statistical machine translation (SMT) models are trained with the parallel corpora in some specific domains. However, when it comes to a language pair or a different domain without any bilingual resources, the traditional SMT loses its power. Recently, some research works study the unsupervised SMT for inducing a simple word-based translation model from the monoling...
Publication date: 2014